Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set conntrack params in kube-proxy #19182

Merged

Conversation

thockin
Copy link
Member

@thockin thockin commented Dec 29, 2015

Add flags to control max connections (set to 256k vs 64k default) and TCP
established timeout (set to 1 day vs 5 day default). Flags can be set to 0 to
mean "don't change it".

This is only set at startup, and not wrapped in a rectifier loop.

Tested manually.

Fixes #18604

@k8s-bot
Copy link

k8s-bot commented Dec 29, 2015

GCE e2e build/test failed for commit 68492f066dfa8f75fa322c0495330fc7bad9a470.

@k8s-github-robot
Copy link

Labelling this PR as size/L

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 29, 2015
return ioutil.WriteFile("/sys/module/nf_conntrack/parameters/hashsize", []byte(strconv.Itoa(max/4)), 0640)
}

func (realConntracker) SetTCPEstablishedTimeout(seconds int) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the type be Duration instead of int ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose seconds because any finer granularity is not respected. I could go either way, but given the very limited exposure of this, I think simpler is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

@thockin thockin force-pushed the kube-proxy-conntrack-max branch 2 times, most recently from 2eede77 to 407c744 Compare December 30, 2015 19:21
@k8s-bot
Copy link

k8s-bot commented Dec 30, 2015

GCE e2e test build/test passed for commit 407c744906ee962039f8b9a6285bc20e0d458acb.

@k8s-bot
Copy link

k8s-bot commented Dec 30, 2015

GCE e2e test build/test passed for commit 2eede77cfeff21e6c21c6ea05cdeffb8dafcf79a.

@k8s-bot
Copy link

k8s-bot commented Dec 30, 2015

GCE e2e test build/test passed for commit 5263e2638fd29eec00755a42ae88c2abad6a20c6.

Add flags to control max connections (set to 256k vs 64k default) and TCP
established timeout (set to 1 day vs 5 day default).  Flags can be set to 0 to
mean "don't change it".

This is only set at startup, and not wrapped in a rectifier loop.

Tested manually.
@k8s-bot
Copy link

k8s-bot commented Dec 31, 2015

GCE e2e test build/test passed for commit da0ac31.

@thockin
Copy link
Member Author

thockin commented Dec 31, 2015

@k8s-bot unit test this please

2 similar comments
@thockin
Copy link
Member Author

thockin commented Dec 31, 2015

@k8s-bot unit test this please

@thockin
Copy link
Member Author

thockin commented Dec 31, 2015

@k8s-bot unit test this please

@ArtfulCoder ArtfulCoder added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 31, 2015
@ArtfulCoder
Copy link
Contributor

@k8s-bot unit test this please

a unit test seems to have failed. re-running test to confirm

@thockin
Copy link
Member Author

thockin commented Dec 31, 2015

@k8s-bot unit test this please

A different test fails every time. This is intolerable

@thockin
Copy link
Member Author

thockin commented Dec 31, 2015

w00t! Green at last. Green at last! Thank git alrighty it is green at last.'

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@thockin
Copy link
Member Author

thockin commented Dec 31, 2015

@k8s-bot unit test this please

@k8s-bot
Copy link

k8s-bot commented Jan 1, 2016

GCE e2e test build/test passed for commit da0ac31.

@thockin
Copy link
Member Author

thockin commented Jan 1, 2016

@k8s-bot unit test this please

1 similar comment
@thockin
Copy link
Member Author

thockin commented Jan 1, 2016

@k8s-bot unit test this please

@thockin
Copy link
Member Author

thockin commented Jan 1, 2016

This is hopeless - even if I get it to go green, the chances of it being green again for Jenkins is 0.

@k8s-bot unit test this please

@thockin
Copy link
Member Author

thockin commented Jan 1, 2016

@k8s-bot unit test this please

1 similar comment
@wojtek-t
Copy link
Member

wojtek-t commented Jan 2, 2016

@k8s-bot unit test this please

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Jan 2, 2016

GCE e2e test build/test passed for commit da0ac31.

@wojtek-t
Copy link
Member

wojtek-t commented Jan 2, 2016

@k8s-bot unit test this please

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Jan 2, 2016

GCE e2e test build/test passed for commit da0ac31.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Jan 2, 2016
@k8s-github-robot k8s-github-robot merged commit 4f65f68 into kubernetes:master Jan 2, 2016
@thockin thockin deleted the kube-proxy-conntrack-max branch January 4, 2016 04:33
@karlkfi
Copy link
Contributor

karlkfi commented Jan 5, 2016

Kubernetes-Mesos CI smoke tests have been failing on master since this was merged. Guess that CI failure up there wasn't just a red herring!

Haven't figured out why this broke the tests, but it didn't just break them, it stopped the JUnit test report from being produced too...

@karlkfi
Copy link
Contributor

karlkfi commented Jan 5, 2016

Mesos smoke tests are fixed by #19277. Kinda stupid that it doesn't work without the new flags specified.

@ghost
Copy link

ghost commented Jan 5, 2016

@thockin @ArtfulCoder sorry to chime in so late on this one. A few words of caution:

  1. From my experience, increasing the size of the conntrack table does not often help matters. An application that initiates an essentially unlimited number connections in parallel (think web crawler) will typically blow this table, whether it's 64K, 256K or even larger. A better approach is to rate limit the outgoing connection establishment packets.
  2. Increasing the conntrack table size requires significant extra kernel memory, so you typically need to tune some other kernel parameters too (I forget which) to make the larger table fit. It might have fitted by pure luck in the particular kernel config that you tested on, but it's quite possible that it will cause other kernels to explode. Distro managers (Redhat, Ubuntu etc) typically tune all the necessary parameters correctly to work in harmony, so messing with individual ones can be dangerous.

}
// TODO: generify this and sysctl to a new sysfs.WriteInt()
glog.Infof("Setting conntrack hashsize to %d", max/4)
return ioutil.WriteFile("/sys/module/nf_conntrack/parameters/hashsize", []byte(strconv.Itoa(max/4)), 0640)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line appears to be the culprit that fails our smoke testing when max is non-zero.
xref mesosphere/kubernetes-mesos#724

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for posterity: the nf_conntrack module doesn't seem to support setting the value of this hashsize parameter for network namespace other than init_net; this is strictly incompatible with our mesos/docker-based testing environment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tracking it - what is the fix? What can we do?

On Wed, Jan 6, 2016 at 8:54 AM, James DeFelice notifications@github.com
wrote:

In cmd/kube-proxy/app/conntrack.go
#19182 (comment)
:

+type Conntracker interface {

  • SetMax(max int) error
  • SetTCPEstablishedTimeout(seconds int) error
    +}

+type realConntracker struct{}
+
+func (realConntracker) SetMax(max int) error {

  • glog.Infof("Setting nf_conntrack_max to %d", max)
  • if err := sysctl.SetSysctl("net/netfilter/nf_conntrack_max", max); err != nil {
  •   return err
    
  • }
  • // TODO: generify this and sysctl to a new sysfs.WriteInt()
  • glog.Infof("Setting conntrack hashsize to %d", max/4)
  • return ioutil.WriteFile("/sys/module/nf_conntrack/parameters/hashsize", []byte(strconv.Itoa(max/4)), 0640)

for posterity: the nf_conntrack module doesn't seem to support setting the
value of this hashsize parameter for network namespace other than init_net;
this is strictly incompatible with our mesos/docker-based testing
environment.


Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/19182/files#r48979830.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for k8s-mesos I've disabled these tuning parameters by default (read: zero by default). this fixes our CI environment immediately. users can still tweak them if needed/wanted. short of changing the way hashsize is implemented in the kernel module i'm not sure how else to really "fix" this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More links I collected for the context:

@thockin
Copy link
Member Author

thockin commented Jan 20, 2016

On Tue, Jan 5, 2016 at 9:40 AM, Quinton Hoole notifications@github.com wrote:

@thockin @ArtfulCoder sorry to chime in so late on this one. A few words of caution:

From my experience, increasing the size of the conntrack table does not often help matters. An application that initiates an essentially unlimited number connections in parallel (think web crawler) will typically blow this table, whether it's 64K, 256K or even larger. A better approach is to rate limit the outgoing connection establishment packets.

I don't disagree. UDP is particularly unpleasant because we just have
to leave it to linger.

Increasing the conntrack table size requires significant extra kernel memory, so you typically need to tune some other kernel parameters too (I forget which) to make the larger table fit. It might have fitted by pure luck in the particular kernel config that you tested on, but it's quite possible that it will cause other kernels to explode. Distro managers (Redhat, Ubuntu etc) typically tune all the necessary parameters correctly to work in harmony, so messing with individual ones can be dangerous.

We tune the hash size, too, following best practices, though
apparently this breaks in Mesos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants